# Split the MMDDYYYY date string into month, day, and year components
dates =
census %>%
mutate(Month = substr(date, 1, 2), Day = substr(date, 3, 4), Year = substr(date, 5, 8))
skimr::skim_without_charts(dates)
| | |
|---|---|
| Name | dates |
| Number of rows | 3023 |
| Number of columns | 19 |
| Column type frequency: | |
| character | 16 |
| numeric | 3 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| unique_squirrel_id | 0 | 1.00 | 13 | 14 | 0 | 3018 | 0 |
| hectare | 0 | 1.00 | 3 | 3 | 0 | 339 | 0 |
| shift | 0 | 1.00 | 2 | 2 | 0 | 2 | 0 |
| date | 0 | 1.00 | 8 | 8 | 0 | 11 | 0 |
| age | 121 | 0.96 | 1 | 8 | 0 | 3 | 0 |
| primary_fur_color | 55 | 0.98 | 4 | 8 | 0 | 3 | 0 |
| highlight_fur_color | 1086 | 0.64 | 4 | 22 | 0 | 10 | 0 |
| combination_of_primary_and_highlight_color | 0 | 1.00 | 1 | 27 | 0 | 22 | 0 |
| location | 64 | 0.98 | 12 | 12 | 0 | 2 | 0 |
| lat_long | 0 | 1.00 | 38 | 45 | 0 | 3023 | 0 |
| activity | 200 | 0.93 | 6 | 8 | 0 | 5 | 0 |
| reaction | 780 | 0.74 | 9 | 11 | 0 | 3 | 0 |
| sounds | 2884 | 0.05 | 4 | 5 | 0 | 3 | 0 |
| Month | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
| Day | 0 | 1.00 | 2 | 2 | 0 | 11 | 0 |
| Year | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| hectare_squirrel_number | 0 | 1 | 4.12 | 3.10 | 1.00 | 2.00 | 3.00 | 6.00 | 23.00 |
| long | 0 | 1 | -73.97 | 0.01 | -73.98 | -73.97 | -73.97 | -73.96 | -73.95 |
| lat | 0 | 1 | 40.78 | 0.01 | 40.76 | 40.77 | 40.78 | 40.79 | 40.80 |
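The `Month`, `Day`, and `Year` columns above are cut out of the `date` string by position, which assumes every value is an eight-character `MMDDYYYY` string (the `min` and `max` of 8 in the summary are consistent with that). As an alternative, here is a minimal base R sketch that parses the string into a `Date` and formats the pieces from it. The `toy` data frame and its values are made up for illustration and are not taken from the census.

```r
# Hypothetical sample of MMDDYYYY strings, standing in for census$date
toy <- data.frame(date = c("10142018", "10062018", "10202018"),
                  stringsAsFactors = FALSE)

# Parse the fixed-width string into a Date object
toy$parsed <- as.Date(toy$date, format = "%m%d%Y")

# Derive the same components as the substr() approach
toy$Month <- format(toy$parsed, "%m")
toy$Day   <- format(toy$parsed, "%d")
toy$Year  <- format(toy$parsed, "%Y")

print(toy)
```

One possible advantage of this route is that a string that does not match the format becomes `NA`, so malformed rows are easier to spot than with positional slicing.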
In this part, we look at the Central Park squirrel census collected in October 2018. This graph shows how many squirrels were recorded on each day, split by observation shift (AM or PM).
dates %>%
group_by(Day, shift) %>%
count() %>%
ggplot() +
geom_col(aes(x = Day, y = n, fill = shift)) +
scale_fill_brewer(palette = "Paired") +
labs(x = "Day", y = "Number of Observations", fill = "Time of day") +
ggtitle('Central Park Squirrels Distribution by Days (AM/PM)') +
theme(plot.title = element_text(hjust = 0.5), panel.grid.minor = element_blank(), panel.grid.major = element_blank())
The first graph plots the number of observations per day, with the morning (AM) and afternoon (PM) shifts shown separately. Squirrels tended to be recorded more often in the PM shift, which suggests they are more active in the afternoon or evening. A limitation of the data is that it records only the shift, not the exact time of each sighting. We assume the PM sightings occurred before sunset, since squirrels should be busy collecting food while there is daylight.
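To check the AM/PM comparison numerically rather than by eye, the shift totals and shares can be tabulated. This is a sketch on made-up rows: `toy_shifts` is hypothetical, and only the column names `Day` and `shift` come from the analysis above.

```r
# Hypothetical observations; each row is one sighting
toy_shifts <- data.frame(
  Day   = c("06", "06", "06", "07", "07", "07", "07"),
  shift = c("AM", "PM", "PM", "AM", "AM", "PM", "PM"),
  stringsAsFactors = FALSE
)

# Observations per shift across all days
shift_totals <- table(toy_shifts$shift)

# Share of observations in each shift
shift_share <- prop.table(shift_totals)

# Per-day counts: rows are days, columns are shifts
by_day <- table(toy_shifts$Day, toy_shifts$shift)

print(shift_totals)
print(round(shift_share, 2))
print(by_day)
```

Applied to the real `dates` data frame, the same three tables would show whether the PM shift leads overall or only on certain days.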
In this part, we graph the number of squirrels recorded on each day by their primary fur color.
dates %>%
group_by(Day, primary_fur_color) %>%
count() %>%
ggplot() +
geom_col(aes(x = Day, y = n, fill = primary_fur_color)) +
ggtitle('Central Park Squirrels Distribution by Primary Fur Color') +
theme(plot.title = element_text(hjust = 0.5), panel.grid.minor = element_blank(),panel.grid.major = element_blank()) +
labs( x='Day', y= 'Number of Observations') +
labs(fill='Primary Fur Color') +
scale_fill_manual(values = c("#000000", "#D2691E", "#D3D3D3", "white"))
The second graph plots the number of observations per day by primary fur color. The number of observations varied from day to day with no clear pattern. Squirrels were observed most often on Oct. 7 and Oct. 13, and sightings clearly dropped in the last few days. Overall, gray squirrels were the most numerous and black ones the fewest. Cinnamon squirrels were also observed fairly often, along with some whose color was not identified.
pie_1 =
dates %>%
filter(age !='?') %>%
filter(!is.na(age))
fig_1 =
plot_ly(data = count(pie_1, age), labels = ~age, values = ~n, type = 'pie', insidetextfont = list(color = '#FFFFFF')) %>%
layout(title = 'Central Park Squirrels Distribution by Age Group',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig_1
The third graph is a pie chart showing the distribution of squirrels by age group. The majority (88.6%) were adults, while the remaining 11.4% were juveniles. It is unclear how observers determined age; size may have been the cue. A limitation is that only ‘adult’ and ‘juvenile’ were recorded, and the analysis might be more informative if finer stages such as ‘baby’ or ‘old’ were available.
pie_2 =
dates %>%
filter(age == "Adult") %>%
filter(!is.na(primary_fur_color))
fig_2 =
plot_ly(data = count(pie_2, primary_fur_color), labels = ~primary_fur_color, values = ~n, type = 'pie', insidetextfont = list(color = '#FFFFFF')) %>%
layout(title = 'Squirrels Fur Color Distribution (Adult)',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig_2
The fourth graph shows the distribution of adult squirrels only, by primary fur color. The majority (83.6%) of adult squirrels were gray, 12.8% were cinnamon, and the remaining 3.62% were black.
pie_3 =
dates %>%
filter(age == "Juvenile") %>%
filter(!is.na(primary_fur_color))
fig_3 =
plot_ly(data = count(pie_3, primary_fur_color), labels = ~primary_fur_color, values = ~n, type = 'pie', insidetextfont = list(color = '#FFFFFF')) %>%
layout(title = 'Squirrels Fur Color Distribution (Juvenile)',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig_3
The fifth graph shows the distribution of juvenile squirrels only, by primary fur color. The distribution was similar to that of the adults: 79.5% of juvenile squirrels were gray, 18% were cinnamon, and the remaining 2.5% were black.
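The percentages quoted for the adult and juvenile pie charts can also be read from a single cross-tabulation of age against fur color. The sketch below uses base R on a made-up data frame; `toy_sq` and its rows are hypothetical, and only the column names and the `'?'` placeholder follow the code above.

```r
# Hypothetical sightings, including an unknown ('?') and a missing age
toy_sq <- data.frame(
  age = c("Adult", "Adult", "Adult", "Adult",
          "Juvenile", "Juvenile", "?", NA),
  primary_fur_color = c("Gray", "Gray", "Cinnamon", "Black",
                        "Gray", "Cinnamon", "Gray", "Gray"),
  stringsAsFactors = FALSE
)

# Keep rows with a known age and a known fur color
known <- subset(toy_sq,
                !is.na(age) & age != "?" & !is.na(primary_fur_color))

# Counts by age group (rows) and fur color (columns)
counts <- table(known$age, known$primary_fur_color)

# Percentages within each age group; every row sums to 100
pct <- round(100 * prop.table(counts, margin = 1), 1)
print(pct)
```

Putting both age groups in one table makes the adult and juvenile distributions directly comparable, which the separate pie charts only allow by eye.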
In our dataset, five kinds of activity are reported: foraging, running, eating, climbing, and chasing. This plot shows Central Park squirrel activity by primary fur color.
plot_2 =
dates %>%
filter(!is.na(activity)) %>%
filter(!is.na(primary_fur_color)) %>%
group_by(primary_fur_color) %>%
count(activity, sort = TRUE)%>%
ggplot(aes(x = reorder(activity, n), y = n)) +
geom_bar(aes(fill = primary_fur_color), stat = "identity") +
scale_fill_manual(values = c("#000000", "#D2691E", "#D3D3D3", "white")) +
theme_classic() +
facet_wrap(~primary_fur_color, nrow = 1) +
labs(title = "Squirrel Activity by Primary Fur Color", y = 'Number of Observations', x = 'Activity') +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip()
plot_2
The sixth graph shows squirrel activities by primary fur color. Regardless of fur color, squirrels foraged most often and chased least often, which makes sense because squirrels need to store food for the cold months.
This plot shows Central Park squirrel activity by location (above ground or ground plane).
dates %>%
filter(!is.na(activity)) %>%
filter(!is.na(location)) %>%
group_by(activity) %>%
mutate(n=1) %>%
ggplot() +
geom_col(aes(y=n,x = activity, fill = location), position="fill") +
ggtitle('Central Park Squirrels Activities by Location') +
theme(plot.title = element_text(hjust = 0.5), panel.grid.minor = element_blank(), panel.grid.major = element_blank()) +
labs(x = 'Activity', y = 'Proportion', fill = 'Location')
The last graph shows how the split between locations differs by activity. For example, climbing happened above ground most of the time, while foraging, running, and eating mostly happened on the ground plane. Chasing was about equally likely above ground and on the ground plane.
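The stacked proportions in that chart come from dividing each activity/location count by the total for its activity. A base R sketch of the same calculation follows; `toy_act` is made-up data, and its labels are placeholders rather than values confirmed from the census.

```r
# Hypothetical sightings with an activity and a location
toy_act <- data.frame(
  activity = c("climbing", "climbing", "climbing",
               "foraging", "foraging", "foraging", "foraging",
               "chasing", "chasing"),
  location = c("Above Ground", "Above Ground", "Ground Plane",
               "Ground Plane", "Ground Plane", "Ground Plane", "Above Ground",
               "Above Ground", "Ground Plane"),
  stringsAsFactors = FALSE
)

# One row per activity/location pair with its count
counts <- as.data.frame(table(activity = toy_act$activity,
                              location = toy_act$location),
                        responseName = "n")

# Divide each count by its activity total, which is what position = "fill" plots
counts$prop <- ave(counts$n, counts$activity, FUN = function(x) x / sum(x))

print(counts[order(counts$activity), ])
```

Having the proportions as numbers makes statements such as "about equally likely" checkable against the actual shares.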
We built this overall map to show where each squirrel was sighted in Central Park, colored by primary fur color.
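# Color palette keyed by fur color; the domain labels must match the values in primary_fur_color exactly
pal_coats <- colorFactor(c("#000000", "#D2691E", "#D3D3D3", "white"), domain = c("Black", "Cinnamon", "Grey", "NA"))
# Locations based on fur color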
census %>%
filter(!is.na(primary_fur_color)) %>%
leaflet() %>%
addTiles() %>%
addCircleMarkers(lng = ~long,
lat = ~lat, radius = 3, color = ~pal_coats(primary_fur_color), stroke = FALSE, fillOpacity = 0.5) %>%
addLegend(position = "topright",pal = pal_coats, values = ~primary_fur_color)
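A palette built with a fixed `domain` can only map values that are spelled exactly like the domain entries, so it can help to compare the labels in the data against the domain before drawing the map. The following is a base R sketch: `fur_values` is made-up data, and the idea that the column might contain "Gray" rather than "Grey" is an assumption for illustration only, not something confirmed from the census.

```r
# Domain used when building the palette above
palette_domain <- c("Black", "Cinnamon", "Grey", "NA")

# Hypothetical fur color values, including a missing one
fur_values <- c("Gray", "Cinnamon", "Black", NA, "Gray")

# Labels present in the data but absent from the domain
present   <- unique(fur_values[!is.na(fur_values)])
unmatched <- setdiff(present, palette_domain)

if (length(unmatched) > 0) {
  message("Values without a palette entry: ",
          paste(unmatched, collapse = ", "))
}
```

Note that the string `"NA"` in the domain is an ordinary label and is not the same as a missing value; the map code already drops missing fur colors with `filter(!is.na(primary_fur_color))`.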
If you would like to track squirrels by several characteristics (fur color, age, time of day, interaction, and activity), we have built an AWESOME interactive app named “Central Park Squirrel Tracker”.